Multi Loci Phylogenetic Analysis with Gene Tree Clustering

Authors

  • Ruriko Yoshida
  • Kenji Fukumizu
  • Chrysafis Vogiatzis
Abstract

Summary: Both theory and empirical evidence indicate that phylogenies (trees) of different genes (loci) do not display precisely matched topologies. This phylogenetic incongruence is attributed to the reticulated evolutionary history of most species, due to meiotic sexual recombination in eukaryotes or horizontal transfer of genetic material in prokaryotes. Nonetheless, most genes do display topologically related phylogenies, which implies that they form cohesive subsets (clusters). In this work, we compare popular clustering methods and show that the normalized cut (Ncut) framework is efficient and statistically accurate when obtaining clusters on the set of gene trees, based on the geodesic distance between them over the Billera-Holmes-Vogtmann (BHV) tree space. We then present a computational study of the performance of different clustering methods, with and without preprocessing, under different distance metrics and using a series of dimension reduction techniques.

Results: First, we show using simulated data that the Ncut framework accurately clusters the set of gene trees given a species tree under the coalescent process. We then demonstrate the success of our framework by comparing its performance to other clustering techniques, including k-means and hierarchical clustering. The main computational results can be summarized as follows: the Ncut framework performs very well even without dimension reduction; k-means and Ncut perform similarly under most dimension reduction schemes; hierarchical clustering fails to capture the clusters accurately; and the NJp method performs significantly better than MLE.

arXiv:1506.07976v2 [q-bio.PE] 31 Dec 2015
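To make the clustering step in the abstract concrete, the following is a minimal sketch of a two-way normalized cut on a precomputed distance matrix, using the standard spectral relaxation of Shi and Malik. It is an illustration under stated assumptions, not the authors' implementation: the function name `ncut_bipartition`, the Gaussian affinity, and the bandwidth `sigma` are hypothetical choices, and the toy distances between points on a line stand in for BHV geodesic distances between gene trees, which are not computed here.

```python
import numpy as np


def ncut_bipartition(dist, sigma=None):
    """Split items into two clusters from a symmetric distance matrix.

    Uses the spectral relaxation of the normalized cut: threshold the
    second-smallest generalized eigenvector of (D - W) y = lambda * D * y.
    """
    dist = np.asarray(dist, dtype=float)
    if sigma is None:
        # Assumed heuristic: median of the off-diagonal distances.
        sigma = np.median(dist[np.triu_indices_from(dist, k=1)])

    # Assumed affinity: Gaussian kernel on the distances, no self-loops.
    w = np.exp(-(dist ** 2) / (2.0 * sigma ** 2))
    np.fill_diagonal(w, 0.0)

    deg = w.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)

    # Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}.
    lap = np.eye(len(w)) - d_inv_sqrt[:, None] * w * d_inv_sqrt[None, :]

    # eigh returns eigenvalues in ascending order; column 1 is the
    # eigenvector for the second-smallest eigenvalue.
    _, eigvecs = np.linalg.eigh(lap)

    # Map back to the generalized eigenvector and split by sign.
    fiedler = d_inv_sqrt * eigvecs[:, 1]
    return (fiedler > 0).astype(int)


# Toy stand-in for tree-to-tree distances: two well-separated groups.
positions = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 5.2])
dist = np.abs(positions[:, None] - positions[None, :])
labels = ncut_bipartition(dist, sigma=1.0)
print(labels)
```

Which group receives label 0 or 1 depends on the sign of the eigenvector, which is arbitrary. In a real analysis, `dist` would be replaced by a matrix of pairwise distances between gene trees, and more than two clusters would require either recursive bipartitioning or clustering on several eigenvectors.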

Similar resources

Clustering Genes of Common Evolutionary History.

Phylogenetic inference can potentially result in a more accurate tree using data from multiple loci. However, if the loci are incongruent, due to events such as incomplete lineage sorting or horizontal gene transfer, it can be misleading to infer a single tree. To address this, many previous contributions have taken a mechanistic approach, by modeling specific processes. Alternatively, one can cl...

Study on phylogenetic status of Hari barbel Luciobarbus conocephalus (Kessler, 1872) from Hari river using Cytb gene

Recently, Luciobarbus conocephalus was reported from the Hari River for the first time, but authors disagree about the validity of this species, because some of them treat it as a subspecies or synonym of L. capito. Therefore, the present study was conducted to investigate the phylogenetic status and validity of this species. For this purpose, specimens captured from Hari Riv...

Quantitative Comparison of Tree Pairs Resulted from Gene and Protein Phylogenetic Trees for Sulfite Reductase Flavoprotein Alpha-Component and 5S rRNA and Taxonomic Trees in Selected Bacterial Species

Introduction: FAD is the cofactor of FAD-FR protein family. Sulfite reductase flavoprotein alpha-component is one of the main enzymes of this family. Based on applications of this enzyme in biotechnology and industry, it was chosen as the subject of evolutionary studies in 19 specific species. Method: Gene and protein sequences of sulfite reductase flavoprotein alpha-component, 5S rRNA sequence...

Evaluation of genetic relationship between 15 bamboo species of North-East India based on ISSR marker analysis

The classification of bamboos based on floral morphology and reproductive characters is very hard due to erratic flowering behavior and unusually long reproductive cycle. The application of reliable and effective DNA molecular markers is highly essential to address this problem. In the present investigation, inter-simple sequence repeats (ISSR) markers were employed to study phylogenetic relati...

Publication date: 2015